Randomized Language Models via Perfect Hash Functions
Authors
Abstract
We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy pruning. We demonstrate the space savings of the scheme via machine translation experiments within a distributed language modeling framework.
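The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the perfect hash here is found by a brute-force seed search that only suits tiny key sets, and the class name `FingerprintLM`, the parameters `fp_bits` and `load`, and the example n-grams and log-probabilities are all hypothetical. What it does show is the storage layout: each slot holds a short fingerprint and a value rather than the n-gram itself, so an unseen n-gram can occasionally be mistaken for a stored one (a false positive whose rate shrinks as `fp_bits` grows).

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def _hash(ngram: str, seed: int) -> int:
    """Seeded 64-bit hash of an n-gram (BLAKE2b with the seed as salt)."""
    digest = hashlib.blake2b(
        ngram.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class FingerprintLM:
    """Toy store: perfect hash -> slot holding (fingerprint, value).

    Assumption: brute-force seed search stands in for a real perfect
    hash construction; it is only practical for very small key sets.
    """

    FP_SEED = 0  # seed reserved for fingerprints; slot seeds start at 1

    def __init__(self, params: Dict[str, float], fp_bits: int = 12,
                 load: float = 0.5) -> None:
        self.fp_mask = (1 << fp_bits) - 1
        self.size = max(1, int(len(params) / load))
        self.seed = self._find_seed(list(params))
        self.slots: List[Optional[Tuple[int, float]]] = [None] * self.size
        for ngram, value in params.items():
            # The n-gram string itself is never stored, only its fingerprint.
            self.slots[self._slot(ngram)] = (self._fingerprint(ngram), value)

    def _find_seed(self, keys: List[str]) -> int:
        """Try seeds until every key lands in a distinct slot."""
        seed = 1
        while len({_hash(k, seed) % self.size for k in keys}) != len(keys):
            seed += 1
        return seed

    def _slot(self, ngram: str) -> int:
        return _hash(ngram, self.seed) % self.size

    def _fingerprint(self, ngram: str) -> int:
        return _hash(ngram, self.FP_SEED) & self.fp_mask

    def lookup(self, ngram: str) -> Optional[float]:
        """Return the stored value if the fingerprint matches, else None."""
        entry = self.slots[self._slot(ngram)]
        if entry is not None and entry[0] == self._fingerprint(ngram):
            return entry[1]
        return None


# Hypothetical bigram log-probabilities, for illustration only.
example_params = {"the cat": -1.2, "cat sat": -0.7, "sat on": -0.9, "on the": -0.4}
example_lm = FingerprintLM(example_params)
```

Calling `example_lm.lookup("the cat")` returns the stored value, since stored n-grams always match their own fingerprint. A lookup for an n-gram that was never stored usually returns `None`, but it can return some stored value when its fingerprint happens to collide; that trade of a small error rate for space is what makes the model "randomized".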
Similar resources
A Simulated Annealing Algorithm for Generating Minimal Perfect Hash Functions
We developed minimal perfect hash functions for a variety of datasets using the probabilistic process of simulated annealing (SA). The SA solution structure is a tree representing an annealed program (algorithm). This solution structure is similar to the structure used in genetic programming. When executed, the SA program produces multiple hash functions for the given data set. An initial hash ...
A Finite-State Library for NLP
A library of functions is described which use finite-state automata for compact storage and efficient usage of very large dictionaries and language models. The library can be used to test whether a word is in a dictionary, to perform morphological analysis, to construct perfect hash tables, and to construct and use very large language models (such as models which employ bigram and trigram frequ...
Generating Minimal Perfect Hash Functions
Randomized, deterministic, and parallel algorithms for generating minimal perfect hash functions (MPHF) are proposed. Given a set of keys, W, which are character strings over some alphabet, the algorithms use a three-step approach (mapping, ordering, searching) to find an MPHF of the form h(w) = (h0(w) + g(h1(w)) + g(h2(w))) mod m, w ∈ W, where h0, h1, h2 are auxiliary pseudorandom functions, m...
Stream-based Randomised Language Models for SMT
Randomised techniques allow very big language models to be represented succinctly. However, being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. We present a novel randomised language model which uses an online perfect hash function to efficiently deal with unbounded text streams. Translation experiments over a text stream...
On the Structure and Complexity of Infinite Sets with Minimal Perfect Hash Functions
This paper studies the class of infinite sets that have minimal perfect hash functions (one-to-one onto maps between the sets and Σ*) computable in polynomial time. We show that all standard NP-complete sets have polynomial-time computable minimal perfect hash functions, and give a structural condition sufficient to ensure that all infinite NP sets have polynomial-time computable minimal perfe...
Publication date: 2008